Overview

Dataset statistics

Number of variables: 12
Number of observations: 1599
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 240
Duplicate rows (%): 15.0%
Total size in memory: 150.0 KiB
Average record size in memory: 96.1 B
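
The duplicate-row figures can be checked by counting every row that repeats an earlier row. The sketch below assumes the report follows the convention of pandas' `DataFrame.duplicated()` with its default `keep='first'` (first occurrences are not counted); the helper names are illustrative. It uses the first five rows of the sample, of which rows 0 and 4 are identical.

```python
def count_duplicate_rows(rows):
    """Count rows identical to an earlier row; first occurrences are not counted."""
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(row)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates


def duplicate_percentage(rows):
    """Duplicate rows as a percentage of all rows."""
    return 100.0 * count_duplicate_rows(rows) / len(rows)


# First five sample rows (index dropped); rows 0 and 4 are identical.
sample = [
    (7.4, 0.70, 0.00, 1.9, 0.076, 11.0, 34.0, 0.9978, 3.51, 0.56, 9.4, 5),
    (7.8, 0.88, 0.00, 2.6, 0.098, 25.0, 67.0, 0.9968, 3.20, 0.68, 9.8, 5),
    (7.8, 0.76, 0.04, 2.3, 0.092, 15.0, 54.0, 0.9970, 3.26, 0.65, 9.8, 5),
    (11.2, 0.28, 0.56, 1.9, 0.075, 17.0, 60.0, 0.9980, 3.16, 0.58, 9.8, 6),
    (7.4, 0.70, 0.00, 1.9, 0.076, 11.0, 34.0, 0.9978, 3.51, 0.56, 9.4, 5),
]

print(count_duplicate_rows(sample))   # 1
print(duplicate_percentage(sample))   # 20.0

# The report's own figures: 240 duplicates out of 1599 rows.
print(f"{100 * 240 / 1599:.1f}%")     # 15.0%
```

The last line checks the reported percentage: 240 / 1599 rounds to 15.0%.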

Variable types

NUM: 12

Reproduction

Analysis started: 2020-06-02 09:01:36.204399
Analysis finished: 2020-06-02 09:02:32.968229
Version: pandas-profiling v2.6.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]

Warnings

Dataset has 240 (15.0%) duplicate rows [Duplicates]
citric acid has 132 (8.3%) zeros [Zeros]

Variables

fixed acidity
Real number (ℝ≥0)

Distinct count: 96
Unique (%): 6.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8.319637273
Minimum: 4.6
Maximum: 15.9
Zeros: 0
Zeros (%): 0.0%
Memory size: 12.6 KiB

Quantile statistics

Minimum: 4.6
5th percentile: 6.1
Q1: 7.1
Median: 7.9
Q3: 9.2
95th percentile: 11.8
Maximum: 15.9
Range: 11.3
Interquartile range (IQR): 2.1

Descriptive statistics

Standard deviation: 1.741096318
Coefficient of variation (CV): 0.2092755082
Kurtosis: 1.132143398
Mean: 8.319637273
Mean absolute deviation (MAD): 1.360135732
Skewness: 0.9827514413
Sum: 13303.1
Variance: 3.031416389
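
The descriptive statistics are tied together by simple identities, which makes a quick consistency check possible: the variance is the square of the standard deviation, the coefficient of variation is the standard deviation divided by the mean, and the sum is the mean times the number of observations. A sketch using the figures reported for fixed acidity (the variable names are illustrative):

```python
n = 1599  # number of observations (from the Overview)

# Descriptive statistics reported for "fixed acidity"
mean = 8.319637273
std = 1.741096318
reported_variance = 3.031416389
reported_cv = 0.2092755082
reported_sum = 13303.1

# Variance is the square of the (sample) standard deviation.
derived_variance = std ** 2
# The coefficient of variation is the standard deviation divided by the mean.
derived_cv = std / mean
# The sum is the mean times the number of observations.
derived_sum = mean * n

print(f"variance: {derived_variance:.6f} vs {reported_variance}")
print(f"CV:       {derived_cv:.8f} vs {reported_cv}")
print(f"sum:      {derived_sum:.3f} vs {reported_sum}")
```

Small differences in the last digits come only from the rounding of the displayed values.
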
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 4.6 4.95 5.95 6.55 8.35 9.15 10.65 12.05 13.35 15.9 ], "bayesian blocks" binning strategy used)
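
The fixed-size histogram splits the range from the minimum to the maximum into ten bins of equal width, (15.9 - 4.6) / 10 = 1.13 for fixed acidity. A minimal stdlib sketch of that binning, assuming the same edge convention as NumPy's `numpy.histogram` (each bin is half-open on the right except the last, which also includes the maximum); the function names are hypothetical:

```python
from bisect import bisect_right


def equal_width_edges(lo, hi, bins):
    """Return bins + 1 evenly spaced bin edges from lo to hi."""
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins)]
    edges.append(hi)  # use the exact maximum as the last edge
    return edges


def histogram(values, bins=10):
    """Count values per equal-width bin; every bin is [left, right) except
    the last, which is [left, right] so that the maximum is counted."""
    edges = equal_width_edges(min(values), max(values), bins)
    counts = [0] * bins
    for v in values:
        idx = bisect_right(edges, v) - 1
        counts[min(idx, bins - 1)] += 1  # the maximum lands past the last edge
    return edges, counts


# Bin edges for "fixed acidity" (minimum 4.6, maximum 15.9)
print([round(e, 2) for e in equal_width_edges(4.6, 15.9, 10)])
# [4.6, 5.73, 6.86, 7.99, 9.12, 10.25, 11.38, 12.51, 13.64, 14.77, 15.9]
```
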
Common values

Value | Count | Frequency (%)
7.2 | 67 | 4.2%
7.1 | 57 | 3.6%
7.8 | 53 | 3.3%
7.5 | 52 | 3.3%
7 | 50 | 3.1%
7.7 | 49 | 3.1%
6.8 | 46 | 2.9%
7.6 | 46 | 2.9%
8.2 | 45 | 2.8%
7.4 | 44 | 2.8%
Other values (86) | 1090 | 68.2%

Minimum 5 values

Value | Count | Frequency (%)
4.6 | 1 | 0.1%
4.7 | 1 | 0.1%
4.9 | 1 | 0.1%
5 | 6 | 0.4%
5.1 | 4 | 0.3%

Maximum 5 values

Value | Count | Frequency (%)
15.9 | 1 | 0.1%
15.6 | 2 | 0.1%
15.5 | 2 | 0.1%
15 | 2 | 0.1%
14.3 | 1 | 0.1%

volatile acidity
Real number (ℝ≥0)

Distinct count: 143
Unique (%): 8.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5278205128
Minimum: 0.12
Maximum: 1.58
Zeros: 0
Zeros (%): 0.0%
Memory size: 12.6 KiB

Quantile statistics

Minimum: 0.12
5th percentile: 0.27
Q1: 0.39
Median: 0.52
Q3: 0.64
95th percentile: 0.84
Maximum: 1.58
Range: 1.46
Interquartile range (IQR): 0.25

Descriptive statistics

Standard deviation: 0.1790597042
Coefficient of variation (CV): 0.3392435493
Kurtosis: 1.22554225
Mean: 0.5278205128
Mean absolute deviation (MAD): 0.1423909174
Skewness: 0.6715925724
Sum: 843.985
Variance: 0.03206237765
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.12 0.17 0.235 0.3075 0.3125 ... 0.6925 0.7875 0.9175 1.055 1.58 ], "bayesian blocks" binning strategy used)
Common values

Value | Count | Frequency (%)
0.6 | 47 | 2.9%
0.5 | 46 | 2.9%
0.43 | 43 | 2.7%
0.59 | 39 | 2.4%
0.36 | 38 | 2.4%
0.58 | 38 | 2.4%
0.4 | 37 | 2.3%
0.49 | 35 | 2.2%
0.38 | 35 | 2.2%
0.39 | 35 | 2.2%
Other values (133) | 1206 | 75.4%

Minimum 5 values

Value | Count | Frequency (%)
0.12 | 3 | 0.2%
0.16 | 2 | 0.1%
0.18 | 10 | 0.6%
0.19 | 2 | 0.1%
0.2 | 3 | 0.2%

Maximum 5 values

Value | Count | Frequency (%)
1.58 | 1 | 0.1%
1.33 | 2 | 0.1%
1.24 | 1 | 0.1%
1.185 | 1 | 0.1%
1.18 | 1 | 0.1%
 

citric acid
Real number (ℝ≥0)

ZEROS
Distinct count: 80
Unique (%): 5.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.2709756098
Minimum: 0
Maximum: 1
Zeros: 132
Zeros (%): 8.3%
Memory size: 12.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0.09
Median: 0.26
Q3: 0.42
95th percentile: 0.6
Maximum: 1
Range: 1
Interquartile range (IQR): 0.33

Descriptive statistics

Standard deviation: 0.1948011374
Coefficient of variation (CV): 0.7188880858
Kurtosis: -0.7889975154
Mean: 0.2709756098
Mean absolute deviation (MAD): 0.1646544334
Skewness: 0.3183372953
Sum: 433.29
Variance: 0.03794748313
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0. 0.005 0.105 0.205 0.265 ... 0.495 0.555 0.685 0.77 1. ], "bayesian blocks" binning strategy used)
Common values

Value | Count | Frequency (%)
0 | 132 | 8.3%
0.49 | 68 | 4.3%
0.24 | 51 | 3.2%
0.02 | 50 | 3.1%
0.26 | 38 | 2.4%
0.1 | 35 | 2.2%
0.08 | 33 | 2.1%
0.01 | 33 | 2.1%
0.21 | 33 | 2.1%
0.32 | 32 | 2.0%
Other values (70) | 1094 | 68.4%

Minimum 5 values

Value | Count | Frequency (%)
0 | 132 | 8.3%
0.01 | 33 | 2.1%
0.02 | 50 | 3.1%
0.03 | 30 | 1.9%
0.04 | 29 | 1.8%

Maximum 5 values

Value | Count | Frequency (%)
1 | 1 | 0.1%
0.79 | 1 | 0.1%
0.78 | 1 | 0.1%
0.76 | 3 | 0.2%
0.75 | 1 | 0.1%
 

residual sugar
Real number (ℝ≥0)

Distinct count: 91
Unique (%): 5.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.538805503
Minimum: 0.9
Maximum: 15.5
Zeros: 0
Zeros (%): 0.0%
Memory size: 12.6 KiB

Quantile statistics

Minimum: 0.9
5th percentile: 1.59
Q1: 1.9
Median: 2.2
Q3: 2.6
95th percentile: 5.1
Maximum: 15.5
Range: 14.6
Interquartile range (IQR): 0.7

Descriptive statistics

Standard deviation: 1.40992806
Coefficient of variation (CV): 0.5553509545
Kurtosis: 28.61759542
Mean: 2.538805503
Mean absolute deviation (MAD): 0.7640645478
Skewness: 4.540655426
Sum: 4059.55
Variance: 1.987897133
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0.9 1.35 1.55 1.625 1.675 ... 3.425 4.625 6.65 8.95 15.5 ], "bayesian blocks" binning strategy used)
Common values

Value | Count | Frequency (%)
2 | 156 | 9.8%
2.2 | 131 | 8.2%
1.8 | 129 | 8.1%
2.1 | 128 | 8.0%
1.9 | 117 | 7.3%
2.3 | 109 | 6.8%
2.4 | 86 | 5.4%
2.5 | 84 | 5.3%
2.6 | 79 | 4.9%
1.7 | 76 | 4.8%
Other values (81) | 504 | 31.5%

Minimum 5 values

Value | Count | Frequency (%)
0.9 | 2 | 0.1%
1.2 | 8 | 0.5%
1.3 | 5 | 0.3%
1.4 | 35 | 2.2%
1.5 | 30 | 1.9%

Maximum 5 values

Value | Count | Frequency (%)
15.5 | 1 | 0.1%
15.4 | 2 | 0.1%
13.9 | 1 | 0.1%
13.8 | 2 | 0.1%
13.4 | 1 | 0.1%
 

chlorides
Real number (ℝ≥0)

Distinct count: 153
Unique (%): 9.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.08746654159
Minimum: 0.012
Maximum: 0.611
Zeros: 0
Zeros (%): 0.0%
Memory size: 12.6 KiB

Quantile statistics

Minimum: 0.012
5th percentile: 0.054
Q1: 0.07
Median: 0.079
Q3: 0.09
95th percentile: 0.1261
Maximum: 0.611
Range: 0.599
Interquartile range (IQR): 0.02

Descriptive statistics

Standard deviation: 0.04706530201
Coefficient of variation (CV): 0.5380949236
Kurtosis: 41.71578725
Mean: 0.08746654159
Mean absolute deviation (MAD): 0.0217734325
Skewness: 5.680346572
Sum: 139.859
Variance: 0.002215142653
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.012 0.0385 0.0475 0.0575 0.0655 ... 0.1245 0.242 0.4135 0.4185 0.611 ], "bayesian blocks" binning strategy used)
Common values

Value | Count | Frequency (%)
0.08 | 66 | 4.1%
0.074 | 55 | 3.4%
0.078 | 51 | 3.2%
0.076 | 51 | 3.2%
0.084 | 49 | 3.1%
0.071 | 47 | 2.9%
0.077 | 47 | 2.9%
0.082 | 46 | 2.9%
0.075 | 45 | 2.8%
0.079 | 43 | 2.7%
Other values (143) | 1099 | 68.7%

Minimum 5 values

Value | Count | Frequency (%)
0.012 | 2 | 0.1%
0.034 | 1 | 0.1%
0.038 | 2 | 0.1%
0.039 | 4 | 0.3%
0.041 | 4 | 0.3%

Maximum 5 values

Value | Count | Frequency (%)
0.611 | 1 | 0.1%
0.61 | 1 | 0.1%
0.467 | 1 | 0.1%
0.464 | 1 | 0.1%
0.422 | 1 | 0.1%
 

free sulfur dioxide
Real number (ℝ≥0)

Distinct count: 60
Unique (%): 3.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.87492183
Minimum: 1
Maximum: 72
Zeros: 0
Zeros (%): 0.0%
Memory size: 12.6 KiB

Quantile statistics

Minimum: 1
5th percentile: 4
Q1: 7
Median: 14
Q3: 21
95th percentile: 35
Maximum: 72
Range: 71
Interquartile range (IQR): 14

Descriptive statistics

Standard deviation: 10.46015697
Coefficient of variation (CV): 0.6589107704
Kurtosis: 2.023562046
Mean: 15.87492183
Mean absolute deviation (MAD): 8.187526914
Skewness: 1.250567293
Sum: 25384
Variance: 109.4148838
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 1. 2.5 4.5 5.25 5.75 ... 27.5 35.5 41.5 56. 72. ], "bayesian blocks" binning strategy used)
Common values

Value | Count | Frequency (%)
6 | 138 | 8.6%
5 | 104 | 6.5%
10 | 79 | 4.9%
15 | 78 | 4.9%
12 | 75 | 4.7%
7 | 71 | 4.4%
9 | 62 | 3.9%
16 | 61 | 3.8%
17 | 60 | 3.8%
11 | 59 | 3.7%
Other values (50) | 812 | 50.8%

Minimum 5 values

Value | Count | Frequency (%)
1 | 3 | 0.2%
2 | 1 | 0.1%
3 | 49 | 3.1%
4 | 41 | 2.6%
5 | 104 | 6.5%

Maximum 5 values

Value | Count | Frequency (%)
72 | 1 | 0.1%
68 | 2 | 0.1%
66 | 1 | 0.1%
57 | 1 | 0.1%
55 | 2 | 0.1%
 

total sulfur dioxide
Real number (ℝ≥0)

Distinct count: 144
Unique (%): 9.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 46.46779237
Minimum: 6
Maximum: 289
Zeros: 0
Zeros (%): 0.0%
Memory size: 12.6 KiB

Quantile statistics

Minimum: 6
5th percentile: 11
Q1: 22
Median: 38
Q3: 62
95th percentile: 112.1
Maximum: 289
Range: 283
Interquartile range (IQR): 40

Descriptive statistics

Standard deviation: 32.89532448
Coefficient of variation (CV): 0.7079166623
Kurtosis: 3.809824488
Mean: 46.46779237
Mean absolute deviation (MAD): 25.35405297
Skewness: 1.515531258
Sum: 74302
Variance: 1082.102373
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 6. 9.5 28.5 49.5 68.5 92.5 113.5 154. 289. ], "bayesian blocks" binning strategy used)
Common values

Value | Count | Frequency (%)
28 | 43 | 2.7%
24 | 36 | 2.3%
18 | 35 | 2.2%
15 | 35 | 2.2%
23 | 34 | 2.1%
20 | 33 | 2.1%
14 | 33 | 2.1%
31 | 32 | 2.0%
38 | 31 | 1.9%
27 | 30 | 1.9%
Other values (134) | 1257 | 78.6%

Minimum 5 values

Value | Count | Frequency (%)
6 | 3 | 0.2%
7 | 4 | 0.3%
8 | 14 | 0.9%
9 | 14 | 0.9%
10 | 27 | 1.7%

Maximum 5 values

Value | Count | Frequency (%)
289 | 1 | 0.1%
278 | 1 | 0.1%
165 | 1 | 0.1%
160 | 1 | 0.1%
155 | 1 | 0.1%
 

density
Real number (ℝ≥0)

Distinct count: 436
Unique (%): 27.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.9967466792
Minimum: 0.99007
Maximum: 1.00369
Zeros: 0
Zeros (%): 0.0%
Memory size: 12.6 KiB

Quantile statistics

Minimum: 0.99007
5th percentile: 0.993598
Q1: 0.9956
Median: 0.99675
Q3: 0.997835
95th percentile: 1
Maximum: 1.00369
Range: 0.01362
Interquartile range (IQR): 0.002235

Descriptive statistics

Standard deviation: 0.001887333954
Coefficient of variation (CV): 0.001893494098
Kurtosis: 0.9340790655
Mean: 0.9967466792
Mean absolute deviation (MAD): 0.001433345818
Skewness: 0.07128766295
Sum: 1593.79794
Variance: 3.562029453e-06
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.99007 0.99152 0.99315 0.99457 0.995495 ... 0.99884 0.999395 0.99945 1.0005 1.00369 ], "bayesian blocks" binning strategy used)
Common values

Value | Count | Frequency (%)
0.9972 | 36 | 2.3%
0.9976 | 35 | 2.2%
0.9968 | 35 | 2.2%
0.998 | 29 | 1.8%
0.9962 | 28 | 1.8%
0.9978 | 26 | 1.6%
0.9964 | 25 | 1.6%
0.997 | 24 | 1.5%
0.9994 | 24 | 1.5%
0.9966 | 23 | 1.4%
Other values (426) | 1314 | 82.2%

Minimum 5 values

Value | Count | Frequency (%)
0.99007 | 2 | 0.1%
0.9902 | 1 | 0.1%
0.99064 | 2 | 0.1%
0.9908 | 1 | 0.1%
0.99084 | 1 | 0.1%

Maximum 5 values

Value | Count | Frequency (%)
1.00369 | 2 | 0.1%
1.0032 | 1 | 0.1%
1.00315 | 3 | 0.2%
1.00289 | 1 | 0.1%
1.0026 | 2 | 0.1%
 

pH
Real number (ℝ≥0)

Distinct count: 89
Unique (%): 5.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.311113196
Minimum: 2.74
Maximum: 4.01
Zeros: 0
Zeros (%): 0.0%
Memory size: 12.6 KiB

Quantile statistics

Minimum: 2.74
5th percentile: 3.06
Q1: 3.21
Median: 3.31
Q3: 3.4
95th percentile: 3.57
Maximum: 4.01
Range: 1.27
Interquartile range (IQR): 0.19

Descriptive statistics

Standard deviation: 0.1543864649
Coefficient of variation (CV): 0.04662675535
Kurtosis: 0.8069425082
Mean: 3.311113196
Mean absolute deviation (MAD): 0.119768664
Skewness: 0.1936834981
Sum: 5294.47
Variance: 0.02383518055
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[2.74 2.865 2.995 3.095 3.145 ... 3.425 3.545 3.615 3.73 4.01 ], "bayesian blocks" binning strategy used)
Common values

Value | Count | Frequency (%)
3.3 | 57 | 3.6%
3.36 | 56 | 3.5%
3.26 | 53 | 3.3%
3.38 | 48 | 3.0%
3.39 | 48 | 3.0%
3.29 | 46 | 2.9%
3.32 | 45 | 2.8%
3.34 | 43 | 2.7%
3.28 | 42 | 2.6%
3.35 | 39 | 2.4%
Other values (79) | 1122 | 70.2%

Minimum 5 values

Value | Count | Frequency (%)
2.74 | 1 | 0.1%
2.86 | 1 | 0.1%
2.87 | 1 | 0.1%
2.88 | 2 | 0.1%
2.89 | 4 | 0.3%

Maximum 5 values

Value | Count | Frequency (%)
4.01 | 2 | 0.1%
3.9 | 2 | 0.1%
3.85 | 1 | 0.1%
3.78 | 2 | 0.1%
3.75 | 1 | 0.1%
 

sulphates
Real number (ℝ≥0)

Distinct count: 96
Unique (%): 6.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.658148843
Minimum: 0.33
Maximum: 2
Zeros: 0
Zeros (%): 0.0%
Memory size: 12.6 KiB

Quantile statistics

Minimum: 0.33
5th percentile: 0.47
Q1: 0.55
Median: 0.62
Q3: 0.73
95th percentile: 0.93
Maximum: 2
Range: 1.67
Interquartile range (IQR): 0.18

Descriptive statistics

Standard deviation: 0.1695069796
Coefficient of variation (CV): 0.2575511321
Kurtosis: 11.72025073
Mean: 0.658148843
Mean absolute deviation (MAD): 0.1190937895
Skewness: 2.428672354
Sum: 1052.38
Variance: 0.02873261613
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.33 0.38 0.435 0.475 0.515 ... 0.785 0.875 0.935 1.19 2. ], "bayesian blocks" binning strategy used)
Common values

Value | Count | Frequency (%)
0.6 | 69 | 4.3%
0.58 | 68 | 4.3%
0.54 | 68 | 4.3%
0.62 | 61 | 3.8%
0.56 | 60 | 3.8%
0.57 | 55 | 3.4%
0.59 | 51 | 3.2%
0.53 | 51 | 3.2%
0.55 | 50 | 3.1%
0.63 | 48 | 3.0%
Other values (86) | 1018 | 63.7%

Minimum 5 values

Value | Count | Frequency (%)
0.33 | 1 | 0.1%
0.37 | 2 | 0.1%
0.39 | 6 | 0.4%
0.4 | 4 | 0.3%
0.42 | 5 | 0.3%

Maximum 5 values

Value | Count | Frequency (%)
2 | 1 | 0.1%
1.98 | 1 | 0.1%
1.95 | 2 | 0.1%
1.62 | 1 | 0.1%
1.61 | 1 | 0.1%
 

alcohol
Real number (ℝ≥0)

Distinct count: 65
Unique (%): 4.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.42298311
Minimum: 8.4
Maximum: 14.9
Zeros: 0
Zeros (%): 0.0%
Memory size: 12.6 KiB

Quantile statistics

Minimum: 8.4
5th percentile: 9.2
Q1: 9.5
Median: 10.2
Q3: 11.1
95th percentile: 12.5
Maximum: 14.9
Range: 6.5
Interquartile range (IQR): 1.6

Descriptive statistics

Standard deviation: 1.065667582
Coefficient of variation (CV): 0.1022420904
Kurtosis: 0.2000293113
Mean: 10.42298311
Mean absolute deviation (MAD): 0.8779685631
Skewness: 0.8608288069
Sum: 16666.35
Variance: 1.135647395
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 8.4 8.9 9.15 9.21666667 9.275 ... 11.03333333 11.08333333 11.925 13.05 14.9 ], "bayesian blocks" binning strategy used)
Common values

Value | Count | Frequency (%)
9.5 | 139 | 8.7%
9.4 | 103 | 6.4%
9.8 | 78 | 4.9%
9.2 | 72 | 4.5%
10.5 | 67 | 4.2%
10 | 67 | 4.2%
11 | 59 | 3.7%
9.3 | 59 | 3.7%
9.6 | 59 | 3.7%
9.7 | 54 | 3.4%
Other values (55) | 842 | 52.7%

Minimum 5 values

Value | Count | Frequency (%)
8.4 | 2 | 0.1%
8.5 | 1 | 0.1%
8.7 | 2 | 0.1%
8.8 | 2 | 0.1%
9 | 30 | 1.9%

Maximum 5 values

Value | Count | Frequency (%)
14.9 | 1 | 0.1%
14 | 7 | 0.4%
13.6 | 4 | 0.3%
13.56666667 | 1 | 0.1%
13.5 | 1 | 0.1%
 

quality
Real number (ℝ≥0)

Distinct count: 6
Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.636022514
Minimum: 3
Maximum: 8
Zeros: 0
Zeros (%): 0.0%
Memory size: 12.6 KiB

Quantile statistics

Minimum: 3
5th percentile: 5
Q1: 5
Median: 6
Q3: 6
95th percentile: 7
Maximum: 8
Range: 5
Interquartile range (IQR): 1
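
The quantiles of quality can be recovered from its value counts alone. The sketch assumes linear interpolation between the two nearest order statistics at position (n - 1) * p, which is the default of pandas' `Series.quantile`; whether this report used exactly that method is an assumption, although it reproduces every figure above. The function name is illustrative.

```python
import math


def quantile(sorted_values, p):
    """Quantile by linear interpolation between the two nearest ranks,
    at 0-based position (n - 1) * p."""
    pos = (len(sorted_values) - 1) * p
    lower = math.floor(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * frac


# Value counts for "quality", expanded back into 1599 sorted observations
counts = {3: 10, 4: 53, 5: 681, 6: 638, 7: 199, 8: 18}
values = sorted(v for v, c in counts.items() for _ in range(c))

for label, p in [("5th percentile", 0.05), ("Q1", 0.25), ("Median", 0.5),
                 ("Q3", 0.75), ("95th percentile", 0.95)]:
    print(label, quantile(values, p))

iqr = quantile(values, 0.75) - quantile(values, 0.25)
value_range = values[-1] - values[0]
print("IQR", iqr, "Range", value_range)  # IQR 1.0 Range 5
```
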

Descriptive statistics

Standard deviation: 0.8075694397
Coefficient of variation (CV): 0.143287121
Kurtosis: 0.2967081198
Mean: 5.636022514
Mean absolute deviation (MAD): 0.6831779243
Skewness: 0.2178015755
Sum: 9012
Variance: 0.6521684
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[3. 3.5 4.5 6.5 7.5 8. ], "bayesian blocks" binning strategy used)
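
The remaining descriptive statistics for quality can also be recomputed from its six value counts. This sketch assumes the conventions of pandas' `Series.var`, `Series.skew` and `Series.kurt`: a sample variance with n - 1 in the denominator, the adjusted Fisher-Pearson skewness G1, and the bias-corrected excess kurtosis G2 (0 for a normal distribution). Under those assumptions it matches the reported values to within rounding. It also computes both readings of "MAD": the reported 0.6831779243 cannot be a median absolute deviation, since for whole-number data that statistic is a multiple of 0.5, but it does equal the mean absolute deviation about the mean.

```python
import math
import statistics

# Value counts for "quality" (from its Common values table)
counts = {3: 10, 4: 53, 5: 681, 6: 638, 7: 199, 8: 18}
values = [v for v, c in counts.items() for _ in range(c)]

n = len(values)
total = sum(values)
mean = total / n


def central_sum(power):
    """Sum of (x - mean) ** power over all observations."""
    return sum((x - mean) ** power for x in values)


s2, s3, s4 = central_sum(2), central_sum(3), central_sum(4)

variance = s2 / (n - 1)          # sample variance (n - 1 denominator)
std = math.sqrt(variance)
cv = std / mean                  # coefficient of variation

# Adjusted Fisher-Pearson skewness (G1)
skewness = math.sqrt(n * (n - 1)) / (n - 2) * (s3 / n) / (s2 / n) ** 1.5

# Bias-corrected excess kurtosis (G2)
kurtosis = (n * (n + 1) * (n - 1) * s4 / ((n - 2) * (n - 3) * s2 ** 2)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

# The two readings of "MAD"
mean_abs_dev = sum(abs(x - mean) for x in values) / n
median = statistics.median(values)
median_abs_dev = statistics.median(abs(x - median) for x in values)

print(f"sum={total} mean={mean:.9f} variance={variance:.7f} std={std:.10f}")
print(f"cv={cv:.9f} skewness={skewness:.10f} kurtosis={kurtosis:.10f}")
print(f"mean abs dev={mean_abs_dev:.10f} median abs dev={median_abs_dev}")
```
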
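Common values

Value | Count | Frequency (%)
5 | 681 | 42.6%
6 | 638 | 39.9%
7 | 199 | 12.4%
4 | 53 | 3.3%
8 | 18 | 1.1%
3 | 10 | 0.6%

Minimum 5 values

Value | Count | Frequency (%)
3 | 10 | 0.6%
4 | 53 | 3.3%
5 | 681 | 42.6%
6 | 638 | 39.9%
7 | 199 | 12.4%

Maximum 5 values

Value | Count | Frequency (%)
8 | 18 | 1.1%
7 | 199 | 12.4%
6 | 638 | 39.9%
5 | 681 | 42.6%
4 | 53 | 3.3%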

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, provided the scale factors are positive (a negative factor flips the sign of r). Consequently, for an exact linear relationship Y = aX + b with a ≠ 0, r is +1 or -1 whatever the slope, that is, whatever the angle of the line to the x-axis.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
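
A direct stdlib transcription of that definition (the function name is illustrative). Because the covariance and both standard deviations carry the same 1/(n - 1) factor, the factors cancel and are omitted. The usage lines also illustrate the slope invariance: any exact linear relationship gives r = +1 or -1.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance of X and Y divided by the product of their
    standard deviations (the common 1/(n - 1) factors cancel)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [3 * v + 2 for v in x]))      # 1.0 (linear, positive slope)
print(pearson_r(x, [-0.5 * v + 10 for v in x]))  # -1.0 (linear, negative slope)
print(pearson_r(x, [v ** 2 for v in x]))         # about 0.981 (monotonic, not linear)
```
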

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

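To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations; in other words, ρ is Pearson's r computed on the ranks. Tied values are conventionally assigned the average of the ranks they span.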
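
A minimal sketch of ρ as Pearson's r applied to ranks, assuming tied values receive the average of the ranks they span (the usual convention); the helper names are illustrative. For y = x³ the relationship is monotonic but not linear, so ρ is exactly 1 while r is not.

```python
import math


def pearson_r(xs, ys):
    """Covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend the run while the next sorted value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y))               # 1.0 (perfectly monotonic)
print(pearson_r(x, y))                  # below 1 (not linear)
print(average_ranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
```
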

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

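To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations: a pair is concordant if X and Y order the two observations the same way, and discordant if they order them oppositely. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2 for n observations. This is the τ-a form; the τ-b variant adjusts the denominator for tied pairs.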
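
A direct O(n²) sketch of the pair-counting definition, i.e. the τ-a form in which tied pairs count as neither concordant nor discordant (the function name is illustrative; variants such as τ-b adjust the denominator for ties instead):

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / (n * (n - 1) / 2).

    A pair is concordant when X and Y order the two observations the same
    way, discordant when they order them oppositely; tied pairs count as
    neither.
    """
    n = len(xs)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # 5 concordant, 1 discordant: 4/6
print(kendall_tau_a([1, 2, 3, 4], [4, 3, 2, 1]))  # -1.0
```
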

Missing values

Sample

First rows

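Index | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality
0 | 7.4 | 0.70 | 0.00 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5
1 | 7.8 | 0.88 | 0.00 | 2.6 | 0.098 | 25.0 | 67.0 | 0.9968 | 3.20 | 0.68 | 9.8 | 5
2 | 7.8 | 0.76 | 0.04 | 2.3 | 0.092 | 15.0 | 54.0 | 0.9970 | 3.26 | 0.65 | 9.8 | 5
3 | 11.2 | 0.28 | 0.56 | 1.9 | 0.075 | 17.0 | 60.0 | 0.9980 | 3.16 | 0.58 | 9.8 | 6
4 | 7.4 | 0.70 | 0.00 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5
5 | 7.4 | 0.66 | 0.00 | 1.8 | 0.075 | 13.0 | 40.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5
6 | 7.9 | 0.60 | 0.06 | 1.6 | 0.069 | 15.0 | 59.0 | 0.9964 | 3.30 | 0.46 | 9.4 | 5
7 | 7.3 | 0.65 | 0.00 | 1.2 | 0.065 | 15.0 | 21.0 | 0.9946 | 3.39 | 0.47 | 10.0 | 7
8 | 7.8 | 0.58 | 0.02 | 2.0 | 0.073 | 9.0 | 18.0 | 0.9968 | 3.36 | 0.57 | 9.5 | 7
9 | 7.5 | 0.50 | 0.36 | 6.1 | 0.071 | 17.0 | 102.0 | 0.9978 | 3.35 | 0.80 | 10.5 | 5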

Last rows

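Index | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality
1589 | 6.6 | 0.725 | 0.20 | 7.8 | 0.073 | 29.0 | 79.0 | 0.99770 | 3.29 | 0.54 | 9.2 | 5
1590 | 6.3 | 0.550 | 0.15 | 1.8 | 0.077 | 26.0 | 35.0 | 0.99314 | 3.32 | 0.82 | 11.6 | 6
1591 | 5.4 | 0.740 | 0.09 | 1.7 | 0.089 | 16.0 | 26.0 | 0.99402 | 3.67 | 0.56 | 11.6 | 6
1592 | 6.3 | 0.510 | 0.13 | 2.3 | 0.076 | 29.0 | 40.0 | 0.99574 | 3.42 | 0.75 | 11.0 | 6
1593 | 6.8 | 0.620 | 0.08 | 1.9 | 0.068 | 28.0 | 38.0 | 0.99651 | 3.42 | 0.82 | 9.5 | 6
1594 | 6.2 | 0.600 | 0.08 | 2.0 | 0.090 | 32.0 | 44.0 | 0.99490 | 3.45 | 0.58 | 10.5 | 5
1595 | 5.9 | 0.550 | 0.10 | 2.2 | 0.062 | 39.0 | 51.0 | 0.99512 | 3.52 | 0.76 | 11.2 | 6
1596 | 6.3 | 0.510 | 0.13 | 2.3 | 0.076 | 29.0 | 40.0 | 0.99574 | 3.42 | 0.75 | 11.0 | 6
1597 | 5.9 | 0.645 | 0.12 | 2.0 | 0.075 | 32.0 | 44.0 | 0.99547 | 3.57 | 0.71 | 10.2 | 5
1598 | 6.0 | 0.310 | 0.47 | 3.6 | 0.067 | 18.0 | 42.0 | 0.99549 | 3.39 | 0.66 | 11.0 | 6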